This paper represents one outcome from the Invitational Research Symposium on
Technology-Enabled and Universally Designed Assessments, which examined technology-enabled
assessments (TEA) and universal design (UD) as they relate to students with disabilities 
(SWD). It was developed to stimulate research into TEAs designed to make tests appro- 
priate for the full range of the student population through enhanced accessibility. Four 
themes are explored: (a) a construct-centered approach to developing accessible assess- 
ments; (b) how technology and UD can provide access to targeted knowledge, skills, and 
abilities by embedding access and interactive features directly into systems that deliver 
TEAs; (c) the possibility of incorporating scaffolding directly into innovative assessment 
items; and (d) the importance of investigating the validity of inferences from TEAs that 
incorporate accessibility features designed to maximize validity. The article conveys the 
issues arising through the symposium and offers insights to researchers who conduct 
studies on the design, development, and validation of technology-enabled and univer- 
sally designed assessments that include SWD. The paper proposes a focused research 
agenda and makes it clear that a principled program of research is needed to properly 
develop and use technology-enabled and universally designed educational assessments 
that encourage the inclusion of SWD. As research progresses, TEAs need to improve how 
they assess students’ understanding of complex academic content and how they provide 
equitable access to all students including SWD. 
Overview

The Invitational Research Symposium on Technology-Enabled and 
Universally Designed Assessments was held in Arlington, Virginia, on July 
23, 2009. Measured Progress and SRI International sponsored this meeting 
focused on the emerging and dynamic field of technology-enabled assess- 
ments (TEA) and the principles of universal design for assessment as they 
relate to students with disabilities. The symposium brought a group of 
researchers together from several areas of expertise including educational 
technology, cognitive psychology, students with disabilities, universal 
design for learning, and educational assessment. Among the participants 
were researchers who had completed or were engaged in research involving 
technology-enabled assessment, universal design for assessment, and/or 
students with disabilities, focused on two specific areas: cognition and 
access. The state of educational assessment and technology had recently 
been described in an article entitled Beyond the Bubble: Technology and the 
Future of Student Assessment (Tucker, 2009). Tucker drew attention to 
assessment challenges in the context of a cognitive model for assessment
and envisioned a future for assessment and technology that resolved many
of the challenges. Tucker’s article provided a foundation for the design
of the symposium content and the meeting agenda, and it motivated the plan
to create a national research agenda that would capture the knowledge,
expertise, and vision generated that day.

To launch the symposium deliberations, four cutting-edge research 
initiatives were conveyed to symposium participants. Chris Camacho 
with Children’s Progress presented an adaptive and scaffolded assessment 
approach that provided prompts after incorrect responses and selected 
assessment items based on examinee responses to a previous question. 
Elizabeth (Boo) Murray, Center for Applied Special Technology, described
an exemplar of universal design for learning, Strategic Reader, that assesses
maze and oral reading fluency via a web-based tool. Jody Clarke-Midura,
Harvard Graduate School of Education, described immersive virtual per- 
formance assessments under development that are designed to assess 
knowledge and skills in science through items embedded within the con- 
text of virtual scenarios. Michael Russell of Nimble Assessment Systems 
demonstrated computer administered assessment tasks with embedded 
tools universally designed to facilitate access to content for students with 
special needs. 

The presentations demonstrating assessment and technology inno- 
vations were followed by a large-group dialogue and discussion between 
the presenters and participants about the research initiatives including 
unique challenges and particular innovations. This discussion delved into 
the target areas of cognition and access and surfaced insights and ques- 
tions arising from consideration of the future of TEAs. The debriefing ses- 
sion resulted in the large group dividing into two subgroups, one tackling 
issues regarding cognition and the other issues regarding access to assess- 
ment content. This article is based on the culmination of the symposium 
day plus the ongoing interactions among the subgroup members, who met 
to generate a research agenda regarding technology-enabled educational 
assessment and access for students with disabilities. 

Twelve participants joined the symposium subgroup that addressed 
access to assessment content and students with disabilities. The twelve 
members of the access group included five researchers from universities, 
three from assessment publishers, three from research institutes, and 
one researcher from a national technology center. The subgroup members 
communicated via email and telephone conference calls over a nine-month 
period following the symposium. Eleven of the participants were contrib- 
uting authors, writing components of this article. Three members of the 
symposium planning-team facilitated communication, assembled interim 
drafts, and assimilated revisions from subsequent reviews. Final edits
were assembled by the facilitators and reviewed by contributing authors 
prior to submission for publication. 
Introduction

The primary goal of this paper is to stimulate research into technology- 
enabled assessments (TEAs) that incorporate conditions designed to make 
tests appropriate for the full range of the student population through 
enhancing accessibility. We explore the concept of accessibility in TEAs, 
particularly as it applies to students with disabilities. In this context, four 
major themes related to access are explored. We begin with a description 
of a construct-centered approach to developing accessible assessments 
in which we emphasize the importance of preserving construct-related 
validity when developing methods for increasing access. The next two sec- 
tions of the paper focus on how technology can be used to provide access 
to the targeted knowledge, skills, and abilities (KSAs) and the role that 
universal design plays in increasing accessibility — in the second section, 
we discuss embedding access and interactive features directly into systems 
that deliver TEAs and, in the third section, we examine the possibility of 
incorporating scaffolding directly into innovative items. The final theme 
addresses the importance of investigating the validity of inferences from 
TEAs that incorporate accessibility features designed to maximize validity. 

This paper is aimed not only at researchers but at policymakers as
well. Recognition of the opportunities for more accessible assessment
afforded by technology-enabled assessment will bolster support
for needed research. Researchers and educators have a responsibility
to explain the promises and issues inherent in TEAs. With their knowl- 
edge, they can formulate strategies for an ongoing program of research 
that does not hinder the use of TEAs but rather strengthens the way they 
are used. A program of research can allow TEAs to realize their potential to 
improve education and provide better data about academic achievement, 
particularly for students with disabilities. 

Background 

Between 2001 and 2010, the No Child Left Behind Act (NCLB) solidi- 
fied the role of large-scale assessments in making summative judgments 
about student learning and school quality. Today, all 50 states and the 
District of Columbia administer annual tests to students across a wide 
range of grade levels. During this period, regulations stemming from 
NCLB and the Individuals with Disabilities Education Act (IDEA, 1997) 
and its amendments (2004) further solidified the importance of using 
assessment data from all students, including students with disabilities, in 
state and local accountability systems. Today, testing programs attempt to 
meet the needs of all students by providing a variety of test accommoda- 
tions, and, in some cases, developing and administering alternate tests 
that are aligned with grade-level content standards. Students with the 
most significant cognitive disabilities participate in alternate assessments
based on alternate achievement standards. Several states also have one 
or two additional alternate assessment options for some students with 
disabilities (alternate assessments based on modified achievement stan- 
dards, alternate assessments based on grade-level achievement standards) 
(Lazarus & Thurlow, 2009). 

According to Madaus, Russell, and Higgins (2009), large-scale sum- 
mative assessments are composed primarily of multiple-choice items. 
Many state tests, however, also include short open-response items and/or 
extended writing items. A few states also employ extended problem solving 
or inquiry tasks that may require students to produce written responses, 
create tables or graphs, and/or to produce drawings or diagrams. 

States are increasingly transitioning their assessment programs to 
computer-based administration (Tucker, 2009). Today, computers are used 
to administer either fixed-form tests that present items to students in a 
predetermined linear manner or adaptive tests that tailor the sequence of 
items presented to each student based on his/her response to prior items. 
Looking to the future, there is increasing evidence that states will continue 
to adopt technological solutions to enhance the efficiency and quality of 
their testing programs. In fact, in its Race to the Top Assessment Program, 
the U.S. Department of Education has launched a major initiative that 
could provide funding to develop and implement technology-enabled 
assessments that are more accessible for students with disabilities (U.S. 
Department of Education, April 2010). As an example, the SMARTER 
Balanced Assessment Consortium (2010), a consortium of 31 states, cites 
the use of computer technology within their assessment planning in sev- 
eral ways, for example, computer adaptive testing, computer-based simu- 
lations, and responses scored by computer. 

As they design technology-enabled assessments, it is likely that test 
developers will experiment with innovative item types that require stu- 
dents to interact with more information and to demonstrate deeper 
levels of knowledge and understanding by manipulating information presented
on a computer screen. For example, students may be required to
manipulate digital representations of a microscope to locate and identify 
microscopic objects (e.g., an amoeba, a cell wall, mitochondria). Other 
items may require students to rearrange objects, such as line segments, to 
create shapes with specific characteristics (e.g., pentagon, perpendicular 
line, line of best fit). Students may also be presented with extended tasks 
that require them to conduct simulated experiments, such as separating 
a mixture into its separate compounds or determining the acidity of a 
substance. A task might involve searching for, selecting, and synthesizing 
information from a number of resources to support an interpretation of 
an historical event. Still other tasks may require students to engage in
role-playing activities within virtual worlds to solve complex problems, 
such as assuming the role of a biologist who is trying to determine why a 
kelp forest is shrinking. 

Access 

As innovative item types are integrated into large-scale testing pro- 
grams, it is vital that the needs of all students are considered during devel- 
opment so that TEAs are as accessible as possible to all students. Tucker 
(2009) notes, “New assessment models must not erode efforts to promote 
high expectations for all students” (p. 2). The goal underlying the design 
and use of TEAs is to obtain more valid inferences about the KSAs of stu- 
dents. For many students with disabilities and special needs, the validity 
of inferences is dependent on the accessibility of the items and tasks 
administered to them and with which they are required to interact. 

Accessibility is a desired characteristic of testing by which students with 
various physical, cognitive, sensory, or linguistic barriers are provided the 
opportunity to demonstrate the KSA intended for measurement — the tar- 
geted KSA (Winter, Kopriva, Chen, & Emick, 2006; Ketterlin-Geller, 2008; 
Beddow, Elliott, & Keller, 2009). As such, accessibility is a prerequisite to
validity, the degree to which a test score interpretation is justifiable for a
particular purpose and supported by evidence and theory (AERA, APA,
& NCME, 1999; Messick, 1989). Tests that require students to possess
KSAs that are orthogonal to the intended constructs introduce construct- 
irrelevant variance into scores and hence compromise validity. A current 
example of reducing barriers and increasing accessibility is the provision of 
a read-aloud assessment administration for some students with learning 
disabilities who use recorded voice and/or text-to-speech accommodations
in the classroom. The challenge is that accessibility is not a static
property of tests, but rather represents an interaction among test features 
and person characteristics that either permit or inhibit student responses 
to the targeted measurement content (Dolan & Rose, 2000; Winter et al.,
2006). To continue the earlier example, a student may not benefit from a
standard read-aloud version of a test if it is distracting to the student; in 
this case, another testing condition or accommodation such as a student- 
controlled text reader might be more beneficial. 
Accessible Testing Through Universal Design

To address the challenge of designing and delivering tests that are 
accessible to and accurate for a wide range of students, the principles of universal
design (UD) (Mace, 1991; Mace, Hardie & Place, 1996; see sidebar)
have been applied to the design, construction, and delivery of tests. The
core tenet of UD is to create flexible solutions that avoid post hoc adapta- 
tion by considering from the start the diverse ways in which individuals 
will interact with their environment. Rose and Meyer (2000, 2002) cre- 
ated a pedagogical application of universal design, which is referred to as 
Universal Design for Learning (UDL). The influence of UD on assessment 
is found in the contribution of Dolan and Hall (2001, 2007) who proposed 
that tests be designed to minimize potential sources of construct-irrele- 
vant variance by supporting the ways that diverse students interact with 
the assessment process. Thompson, Johnstone, and Thurlow (2002, p. 1) 
adapted Mace’s original elements from architecture to derive seven ele- 
ments of accessible and fair tests: “(1) inclusive assessment population; 
(2) precisely defined constructs; (3) accessible, nonbiased items; (4) items 
amenable to accommodations; (5) simple, clear, and intuitive instructions 
and procedures; (6) maximum readability and comprehensibility; and (7) 
maximum legibility.” Ketterlin-Geller (2005, p. 5) provides a more generic 
definition of universal design for testing: an “integrated system with a 
broad spectrum of possible supports” that permits inclusive, fair, and 
accurate testing of diverse students. 
Glossary: Universal Design


Universal Design 

The universal design concept was founded in the 1980s within the field of
architecture by Ron Mace (1991; 1996) at North Carolina State University. 

The goal of universal design is to build structures and products that are 
inherently accessible by considering individuals' diverse needs from the outset 
(e.g., mobility and communication needs), thus reducing the need for retrofitting. 

As such, universal design seeks to minimize assumptions about how individuals 
will interact with what is being built. Television captioning provides a good example 
of universal design in practice. While originally intended for people with hearing 
impairments, who otherwise needed to retrofit their televisions by purchasing 
expensive decoder boxes to access the captions, captioning became standard and 
ubiquitous through legislation that called for building the feature into all televisions. 
This universal design feature now benefits not only those with hearing impairments, 
but far more individuals in health clubs, bars, and airports as well as individuals 
working on their language skills and couples who go to sleep at different times. 
Further, as a built-in feature, access to television captioning costs a few cents rather 
than several hundred dollars. 

Universal Design for Learning 

The educational framework of universal design for learning extends universal design 
from a physical space to a pedagogical space (Rose & Meyer, 2000, 2002), utilizing 
recent discoveries and advances in the cognitive sciences and digital technologies. 

It guides the design of curricula, materials, and assessments that are more accessible 
to most students, including those with disabilities and those who are English learners.
Flexibility is accomplished by accounting for individual differences in how students 
recognize, strategize, and engage in learning situations by providing the following: 

• Alternative formats for presenting information (multiple or transformable 
accessible media) 

• Alternative means for action and expression (writing, drawing, speaking, 
switch, use of graphic organizers, etc.) 

• Alternative means for engagement (background knowledge, options, challenge,
and support)

By providing these alternatives in flexible and customizable ways,
UDL seeks to minimize learning barriers and maximize learning opportunities.


These applications of universal design share common elements. First, 
they propose a solution to accurate testing of the full student population, 
in particular students with disabilities and/or students who are English 
language learners. Second, they propose that test accessibility is best 
accomplished by considering the needs of all students from the beginning 
rather than trying to retrofit assessments later. Accommodations have 
been the typical solution to including students with disabilities in general 
assessment programs; when students receive appropriate accommoda- 
tions, they are able to more meaningfully participate in the assessment 
(Thurlow, Thompson, & Lazarus, 2006) and thus provide a better indication
of their knowledge, skills and abilities. However, no matter how well
designed, administered, and utilized they are, accommodations are post 
hoc retrofits. In addition, some accommodations may affect students’ 
performances differently on different items by introducing sources of 
construct-irrelevant variance or providing unintended construct-relevant 
information. As a result, they may fail to provide students adequate sup- 
port and can compromise validity (Dolan & Hall, 2007). Finally, optimal 
testing provides choice in testing conditions to test administrators and/or
students that takes into account the diverse ways in which construct-irrelevant
challenges and disabilities are manifest and can be supported
in ways that “one-size-fits-all” solutions rarely do (Rose & Meyer, 2000). 

Recent research has suggested that applying universal design prin- 
ciples during test development and delivery can indeed improve testing 
of students with disabilities. For traditional multiple-choice and short- 
response items, research has demonstrated that principles of UD can 
be used to make a more inclusive testing environment that reduces the 
need for accommodations (Dolan, Hall, Banerjee, Chun, & Strangman, 
2005; Lazarus, Thurlow, Lail, & Christensen, 2009; Russell, Hoffmann, 
& Higgins, 2009), and reduces construct-irrelevant variance (Johnstone, 
Bottsford-Miller, & Thompson, 2006). 

The goal of UD is not to create a single assessment condition that is 
accessible for all students (Rose & Meyer, 2000). Instead, a universally 
designed assessment will anticipate the variety of accessibility needs of 
potential students and build in methods that allow all students to access, 
engage with, and respond to test content in the most accessible manner 
possible. There are two important steps to developing a universally 
designed assessment. First, test content must be developed in a way that 
anticipates the different representational needs of students and represen- 
tational forms that meet those needs without violating the test construct. 
Second, the system employed to administer test items to students must 
be designed to flexibly alter the presentation, interaction, and response 
to items, and tailor access to alternate representations based on each stu- 
dent’s individual need. When successfully executed, a universally designed 
assessment shifts the consideration of student/test interactions from 
determining post hoc changes required and providing test accommoda- 
tions to a priori design and administration decisions and development 
of alternate representations during the item and test development stage 
(e.g., Kettler, Elliott, & Beddow, 2009; Ketterlin-Geller, 2008; Thompson, 
Johnstone, & Thurlow, 2002). 

UD can also guide the development and delivery of “innovative items,” 
those that use digital technologies to test students on greater depths of 
knowledge and skill than traditional items. For the reasons already stated,
it is imperative that such items be accessible, accurate, and fair for a diverse
range of students. To the extent that such items involve novel interfaces 
and tasks, they potentially introduce new forms of construct-irrelevant 
variance. A framework and guidelines for applying UD principles to the
creation of innovative TEAs have recently been proposed (Dolan et al., 2006;
Dolan, Rose, Burling, Harms, & Way, 2007) with emphasis on interac- 
tions between students and test features as a function of individual differ- 
ences in perceptual, linguistic, motoric, cognitive, executive, and affective 
processing during item presentation, strategic interaction, and response 
action. 

Access and Interaction between Student Characteristics 
and Test Features 

The term access is widely used in discussions of education policy and 
practice, particularly with regard to special student populations (e.g., stu- 
dents with disabilities, English learners). Recent research that examines 
the interaction between student characteristics and test or item features 
has appeared (e.g., Fuchs, Fuchs, Eaton, Hamlett, & Karns, 2000; Helwig, 
Rozek-Tedesco, & Tindal, 2002; Ketterlin-Geller, Yovanoff, & Tindal, 2007; 
Sato, Rabinowitz, Gallagher, & Huang, 2010), as has research evaluating 
the effects of test access on student performance (e.g., Abedi, Courtney, 
& Leon, 2003; Dolan, Hall, Banerjee, Chun, & Strangman, 2005; Rivera &
Stansfield, 2001; Tindal & Ketterlin-Geller, 2004). The definition of access 
and the enumeration of elements that characterize effective access are still 
emerging (Sato et al., in press). 

Each student has unique characteristics. When student characteristics 
cause construct-irrelevant item features to interfere with the student’s 
opportunity to demonstrate knowledge, skills, or ability, access decreases 
and in turn, the degree to which the response reflects the student’s 
achievement decreases. Item types intended to measure the same con- 
struct might require different processing by examinees, depending on the 
specific processing requirements of the item in terms of its complexity and 
structure (Messick, 1994; Pearson & Garavaglia, 2003; Russell, Goldberg, 
& O’Conner, 2003; Thissen, Wainer, & Wang, 1994). According to Pearson 
and Garavaglia (2003), while two items may be psychometrically equiva- 
lent, they may not be psychologically equivalent — the items may require 
students to access the content in different ways, subsequently affecting 
their processing. As a result, the items may measure either skills or knowl- 
edge that differ from the intended content (construct irrelevance) or may 
provide processing challenges that interfere with the student’s ability to 
fully demonstrate what he or she knows and can do (underestimation). 
Therefore, a degree of flexibility in test design and delivery may be neces- 
sary to best ensure that students with disabilities and others have access 
to tested content, enhance comparability in scores (Ketterlin-Geller,
2008), and create assessments that support more valid inferences across 
subgroups of students. 

Glossary: Access 


Access 

For the purposes of this paper, access in assessment depends on the interaction 
between construct-irrelevant item features and person characteristics that 
either permits or inhibits student response to the targeted measurement content 
(Winter, Kopriva, Chen, & Emick, 2006). 


Test developers and users have often resisted the idea of flexibility in 
presentation and administration of test items and tasks, since standard- 
ization in these areas has been used as a basis for making common infer- 
ences across students, situations, test forms, and other conditions that 
vary in testing (see AERA, APA, & NCME, 1999). However, recent consid- 
eration has been given to the idea of purposeful flexibility in test design 
and delivery that can support comparable and valid score-based interpretations
across student populations and testing conditions (Marion &
Pellegrino, 2006; Sato et al., 2010). To best ensure effective flexibility,
the interaction between student characteristics and the features of
the test itself needs to be understood (Ketterlin-Geller, 2008; Marion &
Pellegrino, 2006; National Research Council, 2001). 

Research revealing the nature of student-item interactions is emerging. 
For example, researchers have extended the assessment triangle (National 
Research Council, 2001) to develop frameworks that support under- 
standing of how students represent knowledge in a domain and the types 
of observations that demonstrate learning, including the interaction 
between student characteristics and assessment techniques (Ketterlin- 
Geller [2008] and Marion and Pellegrino [2006] offer more detail on this 
subject). Frameworks for understanding interactions between categories 
of test taker characteristics and features of the test itself, including both 
targeted and ancillary interactions that affect construct-irrelevant variance
in test scores, also have been developed (Mislevy & Haertel, 2006;
Beddow, in press; Dolan, Burling, Harms, & Way, 2007). Additionally, systematic
error has been examined (e.g., Kopriva, Wiley, & Winter, 2004);
that is, the systematic ways in which target skills may be contaminated, 
misunderstood, or distorted by specific task factors such that the error 
influences task performance and affects the accurate measurement of stu- 
dents’ knowledge and skills and the validity of interpretation of results. 
In short, the concept of increasing accessibility to assessments for
students with disabilities and others has been the focus of a number of 
recent efforts. Frameworks such as the ones mentioned can provide a start 
for organizing the findings around accessibility, identifying the principles 
that can be applied to assessments now and highlighting the issues that 
should be researched as accessibility is addressed in the design and devel- 
opment of TEAs. For the purposes of this white paper, we began with the 
assumption that students received adequate instruction in order to focus 
our discussion around accessibility. Students’ opportunity to learn (OTL) 
academic knowledge, skills, and abilities as well as methods for demon- 
strating these KSAs is prerequisite to assessing achievement. Strategies 
for evaluating and improving OTL for students with disabilities are beyond 
the scope of this discussion. 

A Construct-Centered Approach for 
Designing Accessible Assessments 

All good educational assessment design begins with the defini- 
tion of the targeted KSAs that the test intends to measure (Mislevy & 
Riconscente, 2006). There may be ancillary or non-targeted KSAs required 
for successful performance on particular tasks in an assessment, but they 
are not the focus of measurement. Ancillary KSAs can pose access barriers 
to some students. For example, one type of ancillary KSA requires that 
students be able to perceive the material being assessed (e.g., perceiving 
text on a page or computer screen). Some ancillary KSAs are prerequisites 
to the targeted KSAs. For example, some computation may be required in 
an assessment task that targets using the appropriate procedures for esti- 
mating the probability of a particular event. Both targeted and ancillary 
KSAs must be considered when designing an assessment, to uphold the 
validity of interpretations derived from test scores — the set of targeted 
KSAs or intended constructs must be held central (Kopriva et al., 2004;
Abedi et al., 2003; Zhang et al., 2009, 2010). To continue the example,
the required computation may be kept simple so that it is more likely that 
the task is indeed measuring skill in the targeted area — determining the 
procedure for estimating probability rather than computation; in a con- 
structed-response item, the scoring rubric may ignore the accuracy of the 
computation and focus on characteristics of the procedure, or students 
may be provided with a calculator or other tool to assist with the compu- 
tational aspects of a task. Messick (1994) explained clearly the necessity 
for such a grounded approach based upon thorough understanding and 
explication of constructs: 
A construct-centered approach would begin by asking what complex
of knowledge, skills, or other attributes should be assessed, 
presumably because they are tied to explicit or implicit objectives 
of instruction or are otherwise valued by society. Next, what 
behaviors or performances should reveal those constructs, and 
what tasks or situations should elicit those behaviors? Thus, the 
nature of the construct guides the selection or construction of 
relevant tasks as well as the rational development of construct- 
based scoring criteria and rubrics (p. 17). 

Glossary: KSAs 


What are KSAs? 

KSAs refer to the knowledge, skills and abilities being assessed. 

• Focal (or targeted) KSAs. The primary KSAs being assessed. 

• Additional (ancillary or non-focal) KSAs. Additional KSAs that may be required
for successful performance on an assessment item or task but are not the primary
focus. Some additional KSAs can be supported by the principles of Universal 
Design (UD) and testing accommodations. 

What is the origin of the term KSA? 

Mislevy and Riconscente (2006, p. 62) attribute the phrase knowledge, skills, 
and abilities to industrial psychologists who use KSAs to refer to the targets of 
the inferences they draw. They borrow the term and apply it more broadly for 
assessment to "the nature of the targets of inference and the kinds of information 
that will inform them... ." 


Identifying targeted and ancillary KSAs is a two-step process. The 
first step is to prepare a clear statement of a test’s targeted KSAs that 
are associated, for example, with content standards of interest in a sub- 
ject matter domain. Demonstrating achievement on each standard likely 
requires multiple skills, some of which are targeted and some of which are 
ancillary. The identification of the ancillary KSAs is the second step in the 
design process. By identifying targeted versus ancillary KSAs that may be 
involved in measuring each standard, assessment designers are able to iso- 
late potential sources of construct-irrelevant variance. They are then able 
to design supports that assist students in overcoming barriers presented 
by the ancillary KSAs. The validity of the inference made about a student’s 
performance based on such a process is likely to be improved. 

While there are some approaches for identifying item/task-level 
KSAs, few focus explicitly on identifying ancillary KSAs particularly as 
they apply to the design of assessments for diverse learners. Important 
exceptions are Kopriva (2008) in the context of English language learners, 
Solano-Flores & Li (2006) and Stansfield (2003) in the areas of translation 
and item templates, and Abedi et al. (e.g., Abedi, Courtney, & Leon, 
2003; Abedi, Courtney, Mirocha, Leon, & Goldberg, 2005) and Sato et al. 
(in press) in linguistic modification. In the area of students with disabili- 
ties, DeBarger, Haertel, Villalba, and Colker (2009) and Cameto, Haertel, 
DeBarger, and Morrison (2010) have identified ancillary as well as focal 
KSAs. This approach has also been implemented in some alignment pro- 
cedures such as those for alternate assessments (Flowers, Wakeman, 
Browder, & Karvonen, 2007) and in Achieve’s alignment review process 
(Rothman, Slattery, Vranek, & Resnick, 2002). 

Evidence-Centered Design as a 
Construct-Centered Approach 

Evidence-centered assessment design (ECD) is an approach to creating 
educational assessments in terms of evidentiary arguments built upon 
intended constructs, with explicit attention paid to the potential influ- 
ence of unintended constructs (Mislevy, Steinberg, & Almond, 2003). ECD 
accomplishes this in two ways. The first is by incorporating an overarching 
conception of assessment as an argument from imperfect evidence. This 
argument makes explicit the claims (the inferences that one intends to 
make based on scores) and the nature of the evidence that supports those 
claims (Hansen & Mislevy, 2008; Mislevy & Haertel, 2006). The second is 
by distinguishing the activities and structures involved in the assessment 
enterprise, in order to exemplify an assessment argument in operational 
processes. By making the underlying evidentiary argument more explicit, 
the framework makes operational elements more amenable to examina- 
tion, sharing, and refinement. Making the argument more explicit also 
helps designers meet diverse assessment needs caused by changing 
technological, social, and legal environments (Hansen & Mislevy, 2008; Zhang 
et al., 2009). 

The ECD process involves five layers of activities. The layers focus in 
turn on the identification of the substantive domain to be assessed; the 
assessment argument; the structure of assessment elements such as tasks, 
rubrics, and psychometric models; the implementation of these elements; 
and the way they function in an operational assessment, as described 
below. 

1. Domain Analysis involves determining the specific content 
to be included in the assessment. Use of state content stan- 
dards and the pending common core standards are examples of 
domain analyses. 

2. In Domain Modeling, a high-level description of the overall 
components of the assessment is created and documented. 
Design Patterns used by Mislevy and Haertel (2006) exemplify 
this layer. 

3. The Conceptual Assessment Framework is developed. In this 
layer, the KSAs to be assessed, the evidence that needs to be 
collected, and the features of the tasks that will elicit the 
evidence are specified in minute detail. Ancillary KSAs that may 
be required to respond correctly to an assessment task but are 
not the intended target of the assessment are also specified 
(for example, reading skills in a mathematics examination). By 
identifying these ancillary KSAs, construct-irrelevant variance 
can be minimized in item and task development: potential 
barriers created by the ancillary KSAs can be removed or their 
effects reduced through the provision of appropriate access 
features. 

4. Implementation involves the development of the assessment 
items or tasks using the specifications created in the concep- 
tual assessment framework just described. In addition, scoring 
rubrics are created and the scoring process is specified. 

5. In Delivery, the processes for the assessment administration 
and reporting are created. 

Combining ECD and UD for Accessible, 
Construct-Centered, Technology-Enabled Assessments 

Neither ECD nor UD alone, as independent approaches to assessment 
design, can assure accessible assessments. ECD’s strength is its explication 
of evidentiary arguments and processes for maintaining a focus on 
the intended target of measurement. However, the interactions between 
a student and an assessment are complex, especially when considering 
the diverse ways in which students approach learning, engage in instruc- 
tion, and express what they know and can do. For an ECD approach to 
succeed with a range of students, the interactions between test takers 
and test items must be taken into consideration. While the principles 
of UD help identify the barriers that can limit the performance of stu- 
dents with diverse learning needs and ways to overcome these barriers, 
UD-based generalizations about student abilities and challenges alone also 
are inadequate. Design decisions must be made carefully at the construct 
and item levels with attention to both the diversity of learners, as repre- 
sented in UD, and the need for evidence-centered design, as represented 
in ECD. This is especially true for technology-enabled assessments, where 
the opportunities for student and test interaction can be greater and are 
less understood. 

As mentioned in the introduction, recently UD principles have been 
applied to the development of flexible, new media-based learning 
environments that support a wide range of learners, including those with 
disabilities and those who are English language learners (Rose & Meyer, 
2002; Dolan et al., 2006). This elaborated, UD-based framework includes 
(1) test delivery considerations, (2) item content and delivery consider- 
ations, and (3) component content and delivery considerations. Alone, 
application of this three-tiered approach can help develop technology- 
enabled assessments that are likely to increase accessibility for a range of 
students. Adding an explicit construct-centered and validity argument- 
based approach toward assessment, such as that provided by ECD, to these 
efforts will strengthen their effects on accessibility. 

ECD and UD frameworks are being combined to support the design of 
science assessment tasks as part of an IES-funded project titled Principled 
Science Assessment Designs for Students with Disabilities that is being 
conducted by SRI International, the University of Maryland, and the 
Center for Applied Special Technology (CAST) (DeBarger et al., 2009). In 
this project, the web-based PADI assessment design system is being aug- 
mented to explicate the types of processing (i.e., perceptual, linguistic, 
cognitive, motoric, executive, and affective) students engage in while 
interacting with test items and tasks. Test developers specify the targeted 
and ancillary KSAs and then select ways to increase accessibility on the 
items/tasks by supporting students’ performance on the ancillary KSAs. 
For example, if vocabulary is an ancillary KSA, the assessment designer 
could select task features from a list of features that support linguistic pro- 
cessing. These support features, if implemented during the assessment, 
would provide students with the vocabulary words needed to overcome 
barriers presented by language and symbols that might inhibit their successful 
performance on the targeted KSAs. For example, strategies to support 
students who lack technical vocabulary might include embedded support for key 
terms with a technical glossary, hyperlinks, or footnotes to definitions. 
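
The selection step described above can be illustrated with a minimal sketch. It is not the PADI system's actual data model: the class name `TaskSpec`, the function `candidate_supports`, and the contents of `SUPPORT_CATALOG` are hypothetical, and the support features listed under each processing category are illustrative only.

```python
from dataclasses import dataclass, field

# Hypothetical catalog mapping each processing category named in the text
# to illustrative support features a designer might choose from.
SUPPORT_CATALOG = {
    "perceptual": ["magnification", "high contrast"],
    "linguistic": ["technical glossary", "hyperlinked definitions",
                   "footnoted definitions"],
    "cognitive": ["highlighting of salient content"],
    "motoric": ["keyboard-only navigation", "single-switch access"],
    "executive": ["masking of distracting content"],
    "affective": [],  # left empty: the text names no example for this category
}


@dataclass
class TaskSpec:
    """An assessment task with its targeted (focal) and ancillary KSAs."""
    name: str
    focal_ksas: list
    # Maps each ancillary KSA to the processing category it draws on.
    ancillary_ksas: dict = field(default_factory=dict)


def candidate_supports(task):
    """List the supports a designer could select for each ancillary KSA.

    Supports are offered only for ancillary KSAs; supporting a focal KSA
    would change the construct being measured, so that case is rejected.
    """
    suggestions = {}
    for ksa, category in task.ancillary_ksas.items():
        if ksa in task.focal_ksas:
            raise ValueError(f"{ksa!r} is listed as both focal and ancillary")
        suggestions[ksa] = list(SUPPORT_CATALOG.get(category, []))
    return suggestions


task = TaskSpec(
    name="science task",
    focal_ksas=["interpreting experimental results"],
    ancillary_ksas={"technical vocabulary": "linguistic"},
)
print(candidate_supports(task))
```

In this sketch the designer still makes the final choice; the function only narrows the list to supports that match the type of processing an ancillary KSA requires.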

Extending the application of UD and ECD frameworks to assessment 
design and task development for students with significant disabilities, 
Cameto and colleagues (2010) are conducting research under two Enhanced 
Assessment Grants funded by the U.S. Department of Education. These 
two projects, conducted in close collaboration with consortia of states, 
are implementing several layers of the ECD process, including domain 
analysis, domain modeling, specification of the conceptual assessment 
framework, the authoring of exemplar assessment items and tasks, and 
the design of the assessment delivery system. The principles of UD are 
implemented during the domain modeling and task authoring layers of 
the ECD process. The next steps for the projects combining UD and ECD 
involve validation studies of the items and tasks developed using the combined 
UD and ECD frameworks. (See http://padi-se.sri.com/ for 
more details.) 

Embedded Features Designed to 
Facilitate Access 

Two strategies for changing tests and test items have been used to 
increase the access students with disabilities have to achievement testing: 
testing accommodations and universal design for assessment (Tucker, 
2009). Testing accommodations involve changes to the standard materials 
and procedures employed to measure a given construct and are intended 
to decrease the effect that ancillary constructs have on a student’s test 
performance. Accommodations are intended to ensure access for indi- 
vidual students and address particular student needs (e.g., Hollenbeck, 
2002). Based on a comprehensive review of state testing program policies 
regarding accommodations, Thurlow et al. (2006) identified five catego- 
ries of test accommodations: (1) presentation (e.g., large print booklets, 
Braille, signing), (2) equipment and/or materials (e.g., magnifying glass, 
noise buffer), (3) response methods (e.g., scribes, keyboard, pointing 
devices), (4) schedule and timing (e.g., extended time, breaks, multiple 
test sessions), and (5) setting (e.g., separate room, individual administra- 
tion, carrel). Original conceptions of accommodations have often derived 
from paper-and-pencil testing formats. 

As explained in the introduction, the application of the principles of 
universal design during test design and development produces tests and 
administration procedures that provide flexibility and access for all stu- 
dents, including students with disabilities. There are at least four com- 
ponents of accessibility through universal design that can be applied to 
technology-enabled assessments: (1) flexibility in the way test content is 
presented, (2) flexibility in the way students engage with test content, (3) 
system compatibility with a variety of assistive technologies (e.g., touch 
screen, single switch devices, alternate keyboards such as Intellikeys, 
speech-to-text software), and (4) availability of alternate representations 
by presenting students with alternate versions of text-based content. 
Reading aloud content, translating text-based content into sign language 
or Braille, tactile representations of graphical images, symbolic represen- 
tations of text-based information, narrative representations of chemical 
compounds (e.g., “sodium chloride” instead of “NaCl”), and translating to 
a different language are all forms of alternate representations. Going for- 
ward in this paper, we refer to TEAs that incorporate universal design as 
technology-enabled and universally designed assessment (TE/UDA). 

Mislevy, Wilson, Ercikan, and Chudowsky (2001) indicate that alternate 
presentations change the form in which test content is presented 
to a student. At some point, altered presentations change the construct 
being assessed and before that point, they simply provide different ver- 
sions of the same test content. In some instances, altered presentation 
in TE/UDAs can involve changing contrast, sizing, spacing, and so on. In 
the area of accommodations, changes have been viewed as residing along 
a continuum (Tindal, 1998) from accommodations to modifications. 
Within the context of digital technologies, the distinction between accom- 
modated (same construct) and modified (different construct) becomes a 
multidimensional one. Within TE/UDA, there are numerous alterations, 
for example, changing the sensitivity of a mouse, making glossary defi- 
nitions available with the click of a mouse, highlighting salient content, 
and masking potentially distracting content. Applying universal design 
concepts to educational assessment using digital technologies opens addi- 
tional possibilities to increase usability for a diverse range of students, pro- 
viding solutions not readily available through traditional paper-and-pencil 
testing approaches (Bennett, 1999; Burk, 1999; Dolan & Rose, 2000). 

Technology-enabled testing can offer tools embedded into the assess- 
ment platform and reduce the need for after-the-fact accommodations. For 
example, instead of separately printed test booklets with enlarged print 
for students with reduced vision, a magnification tool can be embedded 
into the assessment delivery system; rather than relying on a teacher to 
read aloud test content to groups of students, students can independently 
have the computer provide read-aloud content on an individual, as-needed 
basis. In fact, a variety of digital technology features that promote access 
to the test content or student response can be embedded into the same 
testing program. Some of these features may be designed for and offered 
to all students, while others may be available only to students who need 
them because of their disabilities. For example, it is conceivable that a 
read-aloud tool on a science test may be available to all students, while a 
tool that translates words into sign language is not. In any case, applying 
universal design to TE/UDA offers the potential to allow all students to 
benefit from a testing environment that adapts to meet individual needs. 

Building on universal design principles, TEA delivery, item writing, 
test design, and test development procedures could include alternate 
representational forms of item content and allow for alternate 
representational forms for student responses. Additional considerations for TE/UDA 
design, development, and delivery include the ability of access features 
to function analogously across operating systems and to function with 
all elements of the test, including the test items, directions, reference and 
formula sheets, calculators, and other digital technologies (e.g., protractors, 
rulers, magnifiers). Additional considerations involve the interaction 
between multiple features used simultaneously, the automatic recording 
of students’ use of particular features, as well as the option to support 
students’ offline research and analysis. 

Embedded Digital Technology Features for Accessibility 

Recent innovations and technological advances have permitted the indi- 
vidualization of specific access strategies without the demand for human 
resources that has been required for testing accommodations. Many 
computer-based assessment applications such as the Kansas Computerized 
Assessment (http://www.cete.us/kap/), NimbleTools (http://nimbletools.com), 
TestNav (http://www.pearsonassessments.com/TestNav), and iTest 
(http://measuredprogress.org/) integrate common accessibility features 
into test delivery systems for general student use and provide additional 
access features that can be tailored to the test taking experience based on 
each student’s individual needs and the unique interface offered by the 
particular technology-enabled assessment. 

Once integrated into an assessment platform, these systems offer 
educators the opportunity to create individual student accessibility pro- 
files prior to the administration of an assessment. An accessibility profile 
specifies the presentation and interaction tool options, alternate repre- 
sentations, and alternate response methods required for a particular stu- 
dent. The test delivery system can employ student accessibility profiles to 
tailor the test administration, that is, access features and representational 
forms available to a particular student during assessment. Alternatively, 
access tools integrated into the system could be selected by the student as 
needed. 
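
As a rough illustration of how a delivery system might use such a profile, the sketch below combines a student's profile with a set of tools offered to all students and checks the result against what the platform supports. The names `AccessibilityProfile`, `UNIVERSAL_TOOLS`, and `tools_for_session` are hypothetical and do not describe any of the systems named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessibilityProfile:
    """Access needs recorded for one student before test administration."""
    student_id: str
    presentation_tools: frozenset = frozenset()
    alternate_representations: frozenset = frozenset()
    response_methods: frozenset = frozenset()


# Assumption for this sketch: these tools are offered to every student.
UNIVERSAL_TOOLS = frozenset({"magnification", "high contrast"})


def tools_for_session(profile, platform_supported):
    """Return (enabled, unmet) tool lists for one test session.

    enabled: requested tools the platform can provide.
    unmet:   requested tools the platform lacks, which would need to be
             handled another way (for example, a human-provided accommodation).
    """
    requested = (
        UNIVERSAL_TOOLS
        | profile.presentation_tools
        | profile.alternate_representations
        | profile.response_methods
    )
    enabled = requested & platform_supported
    unmet = requested - platform_supported
    return sorted(enabled), sorted(unmet)


profile = AccessibilityProfile(
    student_id="S-001",
    presentation_tools=frozenset({"read aloud"}),
    alternate_representations=frozenset({"signed English"}),
    response_methods=frozenset({"speech recognition"}),
)
platform = frozenset(
    {"magnification", "high contrast", "read aloud", "speech recognition"}
)
enabled, unmet = tools_for_session(profile, platform)
print(enabled)  # tools switched on for this student
print(unmet)    # needs the platform cannot meet on its own
```

Reporting unmet needs separately, rather than silently dropping them, reflects the point that some access features may still require accommodations outside the delivery system.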

In applying principles of universal design, some vendors, including 
Measured Progress, NimbleTools, and Pearson, are embedding access fea- 
tures that address task comprehension, interaction, and response into 
their assessment delivery systems. These features include magnification, 
high contrast, altered color contrast, masking, and alternate representa- 
tions of content such as verbal representations of text, tables, formulas, 
scientific notation, and graphics; Braille; signed English; American Sign 
Language; and other languages such as Spanish. Access features provide 
students with interaction and response options such as alternate key- 
boards, single-switch devices, writing supports, and speech recognition. 
Based on each user’s individual needs, assessments with a variety of access 
features can tailor the availability of the features to ensure that each student 
has the access he or she requires without being distracted by tools that 
are not needed. Tailoring the assessment environment to meet students’ 
access needs allows students to more accurately demonstrate their KSAs, 
leading to more valid inferences about their achievement. 

Examples of Technology-Enabled and Universally 
Designed Assessments 

It is possible to increase accessibility within the framework of an 
existing assessment program using UD assessment software. For example, 
in Kentucky, assessment regulations required that a student who used 
assistive technology regularly in the classroom must be provided the 
option to use the same software during the assessment if the assistive 
technology usage was documented in the student’s IEP. Because the state 
testing program was paper-based, online testing was offered as an accom- 
modation for students who met the regulation’s criteria. A subset of the 
existing paper test forms were delivered online via Measured Progress’s 
iTest system, which allowed students to use multiple assistive technology 
devices that were already in place in Kentucky schools on the assess- 
ment. This targeted approach was intended to drive the transition from 
paper-based to online assessment by building awareness and evaluating/ 
updating the technology infrastructure while increasing access for the stu- 
dents who would benefit the most from online testing. This example, for 
practical purposes, is at the lower end of the spectrum of TE/UDA pos- 
sibilities and describes a logical entry point for states that need to start 
with an existing item bank designed for paper-based assessment. Ideally, 
as states move forward, TEA platforms will integrate universally designed 
test content with universally designed assessment software during the 
design and development phase. 

Measured Progress’s iTest system provides a combination of built-in 
accessibility tools, such as high- and altered-color contrast and variable 
screen layouts, along with the ability to interact with third-party assistive 
technology products that can be either embedded or externally accessed. 
While built-in functionality and embedded tool bars allow all students to 
take advantage of a common set of accessibility tools, allowing access to 
external tools gives students with specific accessibility needs the option to 
utilize the same assistive technology software with their preexisting user 
profiles that are used in the classroom on a regular basis. Figure 1 pres- 
ents a basic multiple-choice item, but shows a combination of iTest tools 
plus the Kurzweil toolbar. The figure illustrates an example of an online 
test that interacts with third-party software/assistive technology. In the 
figure, the Kurzweil tool bar is the grey bar at the bottom of the screen. 
Above that is the iTest tool bar. On the right, there are tools for font size 
increase/decrease, for high contrast color palette options, and on-screen 
math tools. On the left, there are strikethrough and highlighter tools. 

Figure 1: iTest with Access Tools and the Kurzweil Toolbar 



Source: Screen shot taken from Measured Progress's iTest system 

Universal design can also be incorporated into innovative item formats. 
Figure 2 shows a science item that was designed and developed 
using UD guidelines (Dolan et al., 2006). The guidelines were used to minimize 
the introduction of sources of construct-irrelevant variance and to 
inform the user interface design in relation to the various modes of stu- 
dent interaction. In addition to having test-based accessibility supports 
such as magnification and read-aloud tools, the item is accessible through 
the keyboard (i.e., does not require pointer control) and can even be made 
single-switch compatible. Accomplishing this level of accessibility post hoc 
is difficult. For TE/UDA, accessibility and the provision of access features 
must be considered during assessment and item design. 

Figure 2: Screen Shot Showing Universal Design with an Innovative Test Item 



Source: Screen shot provided by Pearson (Assessment & Information) 



The Benefits of Built-in Features 

There is a need for a systematic approach to identifying individual 
access needs so that resulting interactions with testing features can be 
understood (Ketterlin-Geller, 2008). To date, testing accommodations 
have been selected individually for each student with a disability by the 
Individualized Education Program (IEP) team. Evidence indicates that 
when implemented with integrity, accommodations may result in differential 
score improvements for students with disabilities (Kettler & Elliott, 
in press), though other studies have found contradictory and inconclusive 
results (Thurlow, Lazarus & Christensen, in press). However, there is also 
evidence to suggest that accommodations are not always implemented as 
prescribed or may be implemented with poor fidelity. When accommoda- 
tions are given to students without their input, some students dislike cer- 
tain testing accommodations and may refuse to use them during testing 
or resent having them provided during testing (Elliott & Thurlow, 2005). 
To the extent that accommodations are implemented with poor fidelity 
or that students dislike using accommodations, the prescribed accommodations 
will be ineffective (Ketterlin-Geller, Alonzo, Braun-Monegan, & 
Tindal, 2007). 

Recent research, however, suggests that many students are more 
willing to have their access needs addressed when access features are 
integrated into technology-enabled universally designed assessments. 
Recent implementations of TE/UDA found that nearly three times as 
many high school students opted to employ a “test accommodation” provided 
within a technology-enabled universally designed assessment compared 
to a paper-based version of the same test (Russell, Hoffmann, & 
Higgins, 2009). These implementations also found that the influence of 
ancillary constructs (e.g., reading and mathematics skills) was decreased 
when students performed the test using the TE/UDA as compared to the 
paper-based version. Further research is needed, however, to examine the 
similarities and differences between testing accommodations employed 
during more traditionally paper-based assessments and technology- 
enabled and universally designed assessments that have access features 
integrated into the test delivery platform. 

Research is needed on several fronts, corresponding with the con- 
text of the old (accommodations) and the new (TE/UDA) approaches for 
increasing access when testing students with disabilities. Specifically, 
evidence is needed on the degree to which specific access strategies com- 
promise or improve the inferences that can be made based on test scores. 
Research is also needed on the methods for selecting access features for 
individual students with disabilities who have individual needs. As with 
accommodations, no student or group should be given an unfair advantage 
over other students. Both test validity and equity need to be considered 
in the research (Sireci, Scarpati, & Li, 2005). 

Using Scaffolding in Technology-Enabled 
Assessment 

“Scaffolding” has been used as a term to describe providing supports 
for student learning for a number of years; a recent definition is “explicit 
and sequentially organized support and guidance about possible strategies” 
(NRC, 2001, p. 278). Most writing about scaffolding uses a concept akin to 
Vygotsky’s (1978) zone of proximal development (ZPD) in discussing how 
to develop, structure, and use scaffolds in instruction. The ZPD is the zone 
between a student’s actual developmental level and the student’s potential 
developmental level; scaffolds modify the levels of learning opportunities 
so that they lie between the two extremes and help students progress 
toward targeted levels of learning. In instruction, scaffolding is used to 
help students gain access to content or concepts by providing supports 
geared to their current learning and/or cognitive capabilities. 

Bruner and colleagues define scaffolding as “constructing simple, 
noiseless opportunities for a child to grasp the sense and reference of 
various signs” (Bruner, 1983, p. 336; Wood, Bruner, & Ross, 1976, cited 
by Bruner). According to Quintana, Krajcik, and Soloway (2002), scaffolds 
help students “do cognitive tasks that are just out of their current develop- 
mental and intellectual capability ... [and] guide and support learners, but 
in a way that learners still need to think about the work they are doing” 
(p. 3). In instruction, scaffolds are meant to be temporary supports — 
once the student has reached the targeted level of learning, the specific 
instructional scaffolds are no longer necessary (new scaffolds may be used 
to bridge the gap to the next level of learning, of course). The transient 
nature of scaffolding differentiates it from learning strategies that are 
meant to be more generalized, such as goal setting before reading or using 
organizational tools for note taking. 

Although scaffolding is a common instructional technique used with 
students with and without disabilities and a natural part of formative 
assessment (Shepard, 2005; Chin & Teou, 2009), it has not been used 
widely in formal summative assessment settings. Part of the reason for 
this is practical: scaffolding for paper-and-pencil tests is not straightforward; 
part is because scaffolding changes the nature of what is being measured, 
often making the item or task less difficult in terms of the targeted 
construct, and we have not had an incentive to devise ways of scoring 
scaffolded tests that take into consideration the difference; and part is because 
large-scale summative assessments are based on a static model of knowledge 
and skills and have not yet been much influenced by the increased 
understanding of how students learn. Scaffolding, if used appropriately, 
might allow us to better measure students’ knowledge and skills by pro- 
viding supports to students that allow them to respond to a task at the 
appropriate entry level. 

Technology allows us to build assessment tasks that provide students 
with the opportunity to use or not use construct-relevant supports when 
they encounter an item, in a manner similar to the use of hints in online 
homework systems or intelligent tutoring systems that present content 
and/or scaffolds adaptively as a function of ongoing evaluation of a stu- 
dent’s KSA compared against a model of the constructs to be learned 
(Woolf, 2008). Appropriately used, scaffolds allow students who would 
otherwise get the item wrong to demonstrate what they do know about 
the item/task content. We do not yet know how to use scaffolds or other 
supports in large-scale assessments. A number of ways of providing sup- 
port can be imagined, as follows: 

• A student selects an incorrect option on a multiple-choice item; 
that option is removed and the student selects a response from 
the remaining options. 

• A student provides an incorrect response; the student is given 
an item that cues the student to an appropriate strategy for 
approaching the construct of the initial item and is then 
presented with an item parallel to the initial item. 

• A student has the option to ask for a demonstration or item 
starter after being shown an item; for example, in a 
drag-and-drop fill-in-the-blank item, the student may request that one 
blank be filled in. In an item asking the student to perform a 
task, the student can request a demonstration. 

This use of scaffolds provides students who have partial understanding 
with the opportunity to respond to an item more fully. Scaffolds can allow 
us to maintain the same expectations for performance for all students 
while providing the opportunity for responses from students along the 
full range of the achievement continuum. Scoring rules will need to be 
carefully developed for scaffolded items to take into account whether and 
how a scaffold is used in a response. 
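
To make the idea of a scoring rule concrete, the sketch below implements the first form of support listed above, in which an incorrect multiple-choice option is removed and the student chooses again. The linear partial-credit rule is purely an assumption for illustration; as noted, appropriate scoring rules have yet to be developed, and the function name `administer_scaffolded_item` is hypothetical.

```python
def administer_scaffolded_item(options, correct, responses, max_score=1.0):
    """Score one multiple-choice item with option-removal scaffolding.

    Each incorrect choice is removed from the remaining options and counted
    as one scaffold used. Assumed scoring rule (illustrative only): credit
    falls linearly with each scaffold used and reaches zero when the
    student arrives at the key only by eliminating every other option.
    """
    if len(options) < 2 or correct not in options:
        raise ValueError("need at least two options, one of them correct")
    remaining = list(options)
    max_wrong = len(options) - 1  # number of incorrect options
    scaffolds_used = 0
    for choice in responses:
        if choice not in remaining:
            raise ValueError("response is not among the remaining options")
        if choice == correct:
            score = max_score * (1 - scaffolds_used / max_wrong)
            return {"score": score, "scaffolds_used": scaffolds_used,
                    "answered": True}
        remaining.remove(choice)  # the scaffold: drop the incorrect option
        scaffolds_used += 1
    # The student stopped responding before reaching the correct option.
    return {"score": 0.0, "scaffolds_used": scaffolds_used, "answered": False}


# A student who answers correctly on the second attempt of a four-option item.
result = administer_scaffolded_item(["A", "B", "C", "D"], "C", ["B", "C"])
print(result)
```

Recording the number of scaffolds used alongside the score keeps the information needed to draw a different inference for students who used scaffolding than for those who did not.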

For this article, we differentiate scaffolds in assessment from universal 
design features and other features students may use to obtain access to 
test content and provide appropriate responses. A scaffold is purposely 
designed to affect the knowledge, skills, and abilities (KSAs) required to 
respond to a task in a defined way, while accommodations and access 
features are not expected to affect the targeted KSAs. If a student uses 
scaffolds when interacting with an assessment task, we make an inference 
about the student’s construct-related KSAs different from that for 
a student who does not use scaffolding; if a student uses an accommo- 
dation or tool provided with the test in responding to an item, we make 
the same inference about the student’s targeted KSAs as we would for a 
student who did not use the tool and provided an equivalent response. 
In short, accommodations and access features are designed to reduce or 
eliminate construct-irrelevant variance in test scores by removing bar- 
riers to responding to the task presented; scaffolds are designed to reduce 
construct-irrelevant variance in test scores by changing the task so that it 
accesses the students’ construct-relevant KSAs better than the task would 
without scaffolding. The examples above show how the item/task diffi- 
culty might change with the use of scaffolding while the targeted content 
assessed by the item/task stays the same. 

Developing summative assessment tasks that incorporate scaffolding 
could allow us to do a better job of measuring the KSAs of students whose 
performance is at the lower end of the achievement spectrum, including 
some students with disabilities and low-performing students without 
identified disabilities. Scaffolding could provide us with the same types 
of benefits attributed to the use of traditional computer-adaptive tests 
(CATs). One reason CATs are appealing is that they measure reliably across 
a wide range of achievement by presenting different items to students 
based on how they perform on previous items — if a student gets an item 
wrong, the student is presented with an item that will be easier for the
student.⁵ Scaffolds can be based on relationships among concepts using
techniques such as concept mapping so that appropriate scaffolds are supplied
based on the nature of a student’s incorrect response. Scaffolded assess- 
ments, then, may allow for an approach other than the linear, difficulty- 
based approach of traditional CATs to provide opportunity to perform for 
students at lower achievement levels and thus may be able to provide more 
valid and instructionally relevant results. 
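
The contrast between the two approaches can be sketched in a few lines of Python. This is an illustration only, not a description of any operational CAT or scaffolded assessment: the `Item` class, the difficulty values, the error labels, and the scaffold texts are all hypothetical, and the difficulty rule is the same deliberate oversimplification used in the paragraph above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Item:
    item_id: str
    difficulty: float  # hypothetical scale: larger values mean harder items


def next_item_by_difficulty(
    pool: Sequence[Item], current: Item, was_correct: bool
) -> Optional[Item]:
    """Simplified difficulty-based rule: after a correct response move to the
    nearest harder item, after an incorrect response to the nearest easier one."""
    if was_correct:
        harder = [i for i in pool if i.difficulty > current.difficulty]
        return min(harder, key=lambda i: i.difficulty) if harder else None
    easier = [i for i in pool if i.difficulty < current.difficulty]
    return max(easier, key=lambda i: i.difficulty) if easier else None


# Hypothetical mapping from the nature of an incorrect response to a scaffold.
# In practice such a mapping might be derived from a concept map of the domain.
SCAFFOLD_BY_ERROR = {
    "added_denominators": "Cue: review how to find a common denominator.",
    "ignored_units": "Cue: check the units given in the problem.",
}
DEFAULT_SCAFFOLD = "Offer a demonstration, then present a parallel item."


def scaffold_for_error(error_type: str) -> str:
    """Concept-based rule: the support depends on what went wrong,
    not only on the fact that the response was incorrect."""
    return SCAFFOLD_BY_ERROR.get(error_type, DEFAULT_SCAFFOLD)
```

The first function uses only the correctness of the response; the second uses its content, which is what would allow a scaffolded assessment to depart from a linear, difficulty-based path.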

Scaffolding in Educational Technology 

Advances in educational technology have allowed for implementing 
instructional techniques in nontraditional platforms. Scaffolding is com- 
monly built into instructional software, and researchers have begun 
to systematize how scaffolds should be developed and used in instruc- 
tional technology. Quintana and colleagues (2004), for example, present 
a framework for incorporating scaffolding into instructional software. In 
a monograph describing research on adaptive technologies, Shute and 
Zapata-Rivera (2007) discuss how such technologies can provide support 
for learning and propose a framework for organizing these technologies. 
These frameworks, while developed for instruction, can provide a founda- 
tion for the principled use of scaffolds as part of TE/UDAs. 
Scaffolding is also being used in technology-enabled short- and
medium-cycle formative assessments.⁶ For example, Feng, Heffernan,
and Koedinger (2009) describe the ASSISTment system, which incorpo- 
rates instructional assistance into its assessments. Children’s Progress 
has developed a computer-based formative assessment and instructional 
system for use in grades pre-K through 3 that is based on identifying a 
child’s zone of proximal development so that appropriate scaffolds can be 
presented if the student responds incorrectly to a task (Camacho, 2009). 
By nature, the scaffold affects the KSAs required to respond to the ques- 
tion, and thus must be taken into account in the overall results. The use 
of scaffolding is reflected in how each item is scored (correct without scaf- 
fold = 1 point, correct with scaffold = 0.5, incorrect with scaffold = 0) and 
then in the selection of the next item, resulting in a guided path tailored 
to the student’s cognitive support needs (Children’s Progress Academic 
Assessment [CPAA] Technical Report, 2009). Because the scaffold is pro- 
vided only when the student responds incorrectly, the scaffold is auto- 
matically removed over time, as the student becomes proficient. The CPAA 
blends scaffolding, which is cognitively based, with computer adaptive 
testing (CAT), which is difficulty based. Clarke (2009) described an immer- 
sive virtual performance assessment (IVPA) that is under development. 
The IVPA is a 3-D virtual environment, based on an authentic ecosystem. 
Students take on the identity of a scientist and engage in inquiry practices 
and problem solving. While performing their tasks, students can use scaf- 
folds such as asking virtual scientists for help. The IVPA is built around 
how students learn and demonstrate inquiry skills and is an example of 
using cognitive science in developing scaffolded assessments. Note that 
none of these systems change the expectations for performance through 
their uses of scaffolds. Scaffolds allow the students to demonstrate the 
degree to which they are meeting those expectations. 
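
The scoring rule reported for the CPAA can be written out as a short sketch. The point values (1, 0.5, and 0) come from the description above; the function names and the response format are our own assumptions for illustration, and the sketch does not attempt to reproduce how the CPAA selects the next item.

```python
from typing import Iterable, Tuple


def score_scaffolded_item(correct_first_try: bool, correct_with_scaffold: bool = False) -> float:
    """Score one item under the rule described in the text:
    correct without scaffold = 1, correct with scaffold = 0.5,
    incorrect with scaffold = 0."""
    if correct_first_try:
        return 1.0
    # The scaffold is presented only after an incorrect first response.
    return 0.5 if correct_with_scaffold else 0.0


def total_score(responses: Iterable[Tuple[bool, bool]]) -> float:
    """Sum item scores over (correct_first_try, correct_with_scaffold) pairs.
    The pair format is a hypothetical convenience, not a CPAA data format."""
    return sum(score_scaffolded_item(first, scaffolded) for first, scaffolded in responses)
```

For example, a student who answers one item correctly without support, one correctly after a scaffold, and one incorrectly even with the scaffold would receive 1.5 points under this rule.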

Investigating the Validity of Inferences 
Made from Technology-Enabled 
Universally Designed Assessments 

All the issues discussed in this paper relate to improving the validity 
of inferences made from test scores by taking advantage of the opportu- 
nities TEAs afford to improve access to test items. Psychometric consid- 
erations influencing the valid use of assessment results (e.g., reliability, 
freedom from bias) that apply to paper-and-pencil assessments extend to
technology-enabled assessments. However, such considerations and the 
approaches used to address them are likely to manifest themselves in dif- 
ferent ways in TEAs, especially those designed to improve accessibility 
for students with disabilities. Currently, the effects of accommodations 
on student scores are typically studied in terms of whether the
accommodation alters the targeted construct such that inferences about students
using the accommodation are not comparable to those about students 
who do not use the accommodation (Abedi, Courtney, & Leon, 2003; 
Abedi, Hofstetter, & Lord, 2004; Sireci, Scarpati, & Li, 2005; Thompson, 
Blount, & Thurlow, 2002). The prevailing method for checking
the validity of accommodations relies on the interaction, or “differential
boost,” hypothesis. This hypothesis asserts that students with and without
disabilities may both benefit from an accommodation but that, for the
accommodation to be considered valid, students with disabilities must
receive larger performance gains (Sireci et al., 2005). It is unclear how this
test applies in the context of TE/UDA where universal access to “accom- 
modations” features is a possibility. Claims to validity are strongest when 
threats to validity have been removed or reduced. For students with dis- 
abilities, access is arguably the most relevant threat to validity because a 
lack of appropriate access can contribute to construct-irrelevant variance, 
misrepresentation of students’ abilities, and construct underrepresenta- 
tion (Abedi, Courtney, & Leon, 2003; Abedi, Hofstetter, & Lord, 2004; 
Bielinski, Sheinker, & Ysseldyke, 2003; Elliot et al., 1999; Helwig, Rozek- 
Tedesco, Heath, & Tindal, 1999; Kopriva, Samuelson, Wiley, & Winter, 
2003; Sireci, Li, & Scarpati, 2003; Thurlow & Wiener, 2000). That is, inad- 
equate access could result in the measurement of abilities that are not 
related to the intended test content (construct irrelevance). Inaccessibility 
could allow the student’s disability to interfere with that student’s ability 
to fully demonstrate what he or she knows and can do, and subsequently 
the test results could misrepresent or underestimate the student’s tar- 
geted KSAs. Inadequate access also could affect the intended construct in 
that the assessment no longer sufficiently measures the targeted domain 
(construct underrepresentation) (Sato et al., 2010). Therefore, providing
students with disabilities with appropriate access is critical to ensuring the 
validity of the assessment and inferences drawn from assessment results. 

TE/UDAs offer functionality and flexibility to embed a broader range 
of tools in the assessment tasks presented to students than we can offer 
with after-the-fact accommodations to paper-and-pencil tests. The dif- 
ferences in how access features and accommodations are offered, and the 
impact that potential within-item variability across students could have 
on validity (construct and/or consequential), need to be considered during 
the design of the items and assessment. That is, there needs to be up-front 
(rather than after-the-fact) consideration of the potential for differential 
use of embedded access features across students or student groups for a 
given item or item set, and of how such differences might affect how we 
score the assessment and interpret its results. The impact of these poten- 
tial differences needs to be examined during the development and imple- 
mentation of the assessment in terms of the psychometric considerations 
that influence the validity of assessment results. Thus, through the
process of TE/UDA design, development, and implementation, systematic
examination and verification of the technical quality of the assessment 
and its items need to occur. 

Research Questions 

In this article, we explored the concept of accessibility in technology- 
enabled assessment through four themes: 

• Employing a construct-centered approach to assessment 

• Embedding accessibility features into assessment systems as a 
means of applying the principles of universal design

• Incorporating scaffolding directly into innovative assessment 
items 

• Investigating the validity of inferences from TEAs that 
incorporate accessibility features 

Although the intended audience for this article consists primarily of
researchers, the authors understand that policymakers play a key role in 
establishing research priorities. We believe that researchers and educa- 
tors recognize the need to understand the promises and issues inherent 
in TE/UDAs and that they can assist policymakers in setting goals for 
research into these tests. The proposed research questions are concerned 
with increasing the validity of inferences that can be drawn from TE/UDAs 
about the knowledge, skills, and abilities of students with disabilities and 
all students. We hope that this white paper and the research questions 
raised here will trigger an ongoing program of research that strengthens 
the possibilities and allows technology-enabled and universally designed 
assessments (TE/UDA) to provide better data about academic achieve- 
ment, particularly for students who, because of access needs, have not 
been able to demonstrate the full extent of their KSAs. These questions 
illustrate some of the areas where research is most needed. The questions 
and areas have been separated, but, undoubtedly, the research will address 
multiple questions or aspects of multiple questions within a single study. 

There is a need to compile and adapt existing item/task development 
procedures and research so that they can be used to guide the develop- 
ment of access-based TE/UDAs in a consistent, scalable, and cost-effective 
manner. Questions to be addressed include the following: 

• What standards, guidelines, and procedures are available to
guide the incorporation of access features into TE/UDAs so that 
the effects of ancillary KSAs are minimized? 

• Are there procedures or existing research that will assist in the 
simultaneous development of items/tasks that measure the 
same targeted KSAs but that allow for different approaches to 
the item/task, provide different types of representations of the 
tested content, or eliminate specific barriers to access? 

• Which access features can be integrated into the design of a
test and built in during the test development phase, and which
must be applied on a case-by-case basis, to preserve the validity
of test score inferences?

• How do student access needs change over time, and how can 
these changes be considered in the design of access features and 
selection procedures? 

Evidence-centered design and universal design were discussed as pro- 
cesses that can be used to define targeted and ancillary constructs in a 
principled manner. These procedures can inform how and when access 
features and scaffolds can be included with items/tasks and assessments. 
Other systematic processes to identify targeted and ancillary constructs 
need to be developed or derived from existing research. Additional areas 
for research include the following: 

• How do we validate processes for defining targeted and ancillary 
constructs? 

• How do we determine whether the use of scaffolded items 
creates a test that measures a multidimensional construct or 
multiple constructs? Are ancillary KSAs added when scaffolding 
is used? 

• Are there consistent variables within certain content areas and 
contexts that are the sources of ancillary KSA requirements? 

• What is the relationship between ancillary KSAs and disability- 
specific barriers in items/tasks? 

Features intended to increase accessibility are already being incorpo- 
rated into technology-enhanced assessments. Research on the effects and 
efficacy of specific access features is needed. Examples follow: 

• What are the most appropriate and efficient ways to 
individualize read-aloud features to meet specific student needs 
for reading assistance (e.g., chunking text)? Can synthetic 
speech be used in lieu of human voice-recorded audio without 
sacrificing the validity of test score inferences? What is the
effect of different voices (such as those with accents similar to
the student’s accent) and/or of giving students a choice of voices?

• What kinds of tools can be built into tests that are appropriate
for students with various physical disabilities (e.g., motor
impairment that precludes the use of a mouse, vision
impairment, deaf and hard of hearing)? What additional features
should be made available for students with low-incidence access
needs? 

The idea of “scaffolding” or branching is emerging as a possibility in 
technology-enhanced assessments; these techniques might allow us to
obtain a better measure of students’ KSAs through providing partial credit 
if students use hints or other construct-related supports. Research on the 
design and impact of scaffolds or similar supports is needed. The research 
will first need to address a broad range of questions such as the following: 

• Under what conditions are scaffolds and access features 
distinctly different or on a continuum? When does the reduction 
of cognitive load common to both techniques transition from 
having no effect on the targeted construct to affecting the 
targeted construct? 

• What types of models can be used to design scaffolded items
and tasks, and how do they work in a summative assessment?
Are models equally suitable for less-structured (e.g., literature,
history) and well-structured (e.g., algebra, physics) domains?

• How can we systematically categorize scaffolding in assessment 
(e.g., effects on item difficulty, effects on level of complexity)? 

• How do we design scaffolds to incorporate appropriate pathways 
to supports and subsequent items/tasks in scaffolded tests? How 
do we incorporate knowledge about students’ changing needs for 
scaffolding in how we develop scaffolded items/tasks? 

• How do we determine the effects of scaffolds on the difficulty of 
items/tasks? 

• What scoring models can provide a basis for appropriate 
inferences about students’ KSAs based on TE/UDAs that 
include scaffolded items? Can partial credit models be used to 
appropriately score scaffolded items/tasks or are there models
other than those currently used in educational measurement 
that are more appropriate for scaffolded items? 
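
To make the question about partial credit models concrete, the sketch below computes category probabilities under the standard partial credit model for a single scaffolded item. It assumes, purely for illustration, that the three outcomes are ordered as 0 = incorrect with scaffold, 1 = correct with scaffold, and 2 = correct without scaffold; whether that ordering and this model are appropriate for scaffolded items is exactly what the research would need to establish. The ability and step-difficulty values a user supplies are hypothetical.

```python
import math
from typing import List, Sequence


def pcm_probabilities(theta: float, step_difficulties: Sequence[float]) -> List[float]:
    """Category probabilities under the partial credit model.

    theta: student ability on the latent scale.
    step_difficulties: one value per step between adjacent score categories,
        so two steps give three categories (0, 1, 2).
    """
    # Cumulative sums of (theta - step difficulty); category 0 is fixed at 0.
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - delta))
    weights = [math.exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]
```

With ability equal to both step difficulties, the three outcomes are equally likely; as ability rises, probability shifts toward the unscaffolded correct response.
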

We do not know the best ways to incorporate the use of access features,
alternative representations and pathways, and scaffolds into technology- 
enhanced assessments. For example, should students determine whether 
to use the tools, should teachers select the appropriate tools for students, 
or should there be a mixture of ways to provide the tools? It is critical that 
students and teachers know how to take advantage of these features in a 
way that supports valid inferences. Some areas of research are as follows: 

• How can students be taught to choose access strategies and 
features so that they are afforded the greatest opportunity to 
demonstrate their knowledge, skills, and abilities? 

• If teachers select features for students, can selection procedures 
used for students with disabilities be generalized for use with 
students without disabilities? 

• How can students be taught to decide whether to use scaffolding 
or to attempt an item/task without scaffolding? Can we detect
overuse or unnecessary use of construct-related supports? 

While all the research topics discussed above have a bearing on the 
validity of inferences from TE/UDAs, research focusing directly on validity 
and related technical characteristics is needed. Issues of validity, reliability, 
and fairness may be different in TE/UDAs that include universally avail- 
able access features from those that arise from the use of accommodations 
in paper-and-pencil testing, and the use of scaffolding or branching items
adds another dimension to these issues. Research questions are as follows: 

• What is a framework that can be used to support the validity of 
inferences from access-based TEAs? What specific components 
should be included in that framework and what questions should 
be answered? How can that framework be applied in test design, 
development, implementation, and interpretation? 

• What research designs other than the “differential effect” 
model for evaluating accommodations can be developed for 
investigating access-based TEAs? These designs should be 
sensitive to the effects of ancillary requirements on students, 
whether they have a disability or not. 

• How do universally accessible tools affect the constructs 
assessed? Are the constructs comparable for students who do 
and do not use the features? Is this equivalence mediated by 
disability status? 

• To what extent do access-based TEAs provide opportunity to 
perform for students with disabilities, facilitating access as 
intended? 

• What are the effects of specific access features, representation
options, and scaffolding approaches on student test scores? 
Do these effects differ as a function of student or group 
characteristics, or disability status? 


Conclusions 

At this writing, states are being encouraged by the U.S. Department of 
Education to develop and implement innovative assessments through var- 
ious competitive grant programs, including the Race to the Top competi- 
tion, Enhanced Assessment Grants, and the Investing in Innovation Fund 
Grants. The administration, in its Blueprint for Reform (U.S. Department 
of Education, March 2010), calls for “new assessment systems [that] will 
better capture higher-order skills, provide more accurate measures of stu- 
dent growth, and better inform classroom instruction to respond to aca- 
demic needs” (p. 4). The goals for reauthorization of the Elementary and 
Secondary Education Act (U.S. Department of Education, March 2010) 
continue to emphasize the importance of educating all students well and 
accounting for all students’ learning. The possibility of improving what we 
assess and how we assess it with technology is real. 

As evidenced by the discussion of the state of the art in technology- 
enabled assessment and the issues that should be addressed as we take 
advantage of technology in our assessment programs, a principled program 
of research is needed to properly develop and use technology-enabled uni- 
versally designed assessments. As the research progresses, we can incor- 
porate what we have learned to build TE/UDAs, with the understanding 
that they will continue to improve in how they assess students and how 
they provide access to students, particularly students with disabilities. 

The program of research will need to address the four major themes 
in this paper — the use of a construct-centered approach in designing 
accessible tests, the incorporation of access features into the test delivery 
system, the use of scaffolding in designing items and tests, and the need 
for a well-structured validation framework — as it examines increasing 
access to assessments that are likely to measure learning in ways that are 
not possible through paper-and-pencil tests.⁷ Research designs will need
to be carefully thought out first, through the generation of clear and tar- 
geted research questions, so that the effects of providing access features 
and the effects of changing what is being measured can be estimated. At 
some point, it may be that access features are so thoroughly understood 
and commonly used that building accessible assessments is second nature; 
until then, there will be a need to show whether the tools corrupt or clarify 
the targeted measure. 

As part of this carefully considered effort, researchers will need to
consider alternatives to the prevalent paradigm in accommodations research,
one that tests the “interaction hypothesis”: if the accommodation improves
the scores for students with disabilities more than it improves scores 
for students without disabilities, then there is support that the accom- 
modation is a “valid” one. The interaction hypothesis as a way to study 
the effects of accommodations has been losing support in the field.⁸ One
reason for this is that there can be a number of interpretations of the same 
result from studies relying on demonstrating an interaction effect. For 
example, it could be that the accommodation under study appropriately 
removes a construct-irrelevant access barrier for some students with dis- 
abilities and removes the same barrier for some students without disabili- 
ties. The results might show that both groups, on average, increased their 
scores equally on the accommodated version of the test, with the
inference, according to the interaction hypothesis, that the accommodation is
not valid; for an accommodation to produce valid scores, the interaction
hypothesis requires the accommodation to increase the scores of students 
with disabilities more than the scores of students without disabilities. In 
this case, however, the accommodation is providing for better (more valid) 
scores for students in both groups, rather than reducing the validity of 
scores for students with disabilities (by providing an unfair advantage). 
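
The arithmetic behind this example can be made explicit. The sketch below computes the gain for each group and the differential boost from four mean scores; all score values used with it are hypothetical, and the decision rule is simply the interaction hypothesis as stated above, not a recommended procedure.

```python
from typing import Tuple


def differential_boost(
    swd_standard: float,
    swd_accommodated: float,
    swod_standard: float,
    swod_accommodated: float,
) -> Tuple[float, float, float]:
    """Return (gain for students with disabilities,
    gain for students without disabilities, differential boost)."""
    gain_swd = swd_accommodated - swd_standard
    gain_swod = swod_accommodated - swod_standard
    return gain_swd, gain_swod, gain_swd - gain_swod


def supports_interaction_hypothesis(boost: float) -> bool:
    """Under the interaction hypothesis, an accommodation is judged 'valid'
    only if students with disabilities gain more than other students."""
    return boost > 0
```

With hypothetical means of 40 and 46 for students with disabilities and 55 and 61 for students without disabilities, both groups gain 6 points, the differential boost is 0, and the rule labels the accommodation "not valid" even though it may have removed a construct-irrelevant barrier in both groups.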

The interaction hypothesis may not adequately consider the interac- 
tion between student characteristics and the features of the test itself. 
Discussions of the four themes in this white paper call for researchers to 
tease out the relative importance of the interaction between the needs of 
students with disabilities and the cognitive demands of the test and item 
features within the context of TE/UDA approaches. When individual test 
takers encounter test items, the research needs to consider under what 
conditions target skills are properly conveyed in this interaction and, in 
particular, when communication about targeted information becomes sys- 
tematically contaminated, misunderstood, or distorted. Research needs to 
examine error, which occurs in regular and predictable ways when indi- 
viduals with specific characteristics interact with specific task factors. 
The error, which influences task performance but is not part of what one 
intends to measure, can influence results and interfere with accurately 
measuring targeted knowledge and skills and in interpreting scores. Based 
on the major themes in this paper, it is possible to envision circumstances 
in which the following occurs: 

• Test results do not reflect student knowledge, skills, and abilities 
but rather ancillary KSAs that interfere with measuring targeted 
KSAs. 

• Access features embedded into a TE/UDA testing platform are 
available to a student with disabilities who needs them but are 
not used by the student. 


• Scaffolding alters the targeted construct and therefore the 
inferences that can be made.

Research comparing scores of students with and without disabilities 
designed around the interaction hypothesis may not detect subtleties of 
the student/test interaction. For example, there may be no interaction 
effect found in the following circumstances in which student scores are 
compared with and without a read-aloud accommodation:

• Some students with disabilities do not need the accommodation 
to access the test so their scores do not improve with the read- 
aloud provision. 

• Some students without disabilities who are poor readers benefit 
from the accommodation so their scores improve with the read- 
aloud provision. 

In this case, the read-aloud accommodation appropriately affected 
student performance, but it is not apparent from the mean scores of the 
groups. In other cases, there may appear to be an interaction, but it does
not reflect a real effect, as in the following example: 

• Students with disabilities use the accommodation and their test 
scores improve. 

• Students without disabilities pay no attention to the 
accommodation provided and their test scores do not change. 

In this case, mean scores show that the accommodation affected scores 
of students with disabilities and did not affect scores of students without 
disabilities, but these means differ because of poor implementation of the 
accommodation (students without disabilities did not use it when pro- 
vided). 
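
Both examples turn on the fact that a group mean is a weighted average over subgroups that may respond very differently. The sketch below illustrates this with hypothetical proportions and gains; none of the numbers come from a study.

```python
from typing import Sequence, Tuple


def mean_gain(subgroups: Sequence[Tuple[float, float]]) -> float:
    """Mean score gain for a group, given (proportion, gain) pairs.
    Proportions are assumed to sum to 1 within the group."""
    return sum(proportion * gain for proportion, gain in subgroups)


# First example (hypothetical values): read-aloud helps only those who need it.
swd = [(0.5, 8.0), (0.5, 0.0)]   # half need read-aloud, half do not
swod = [(0.5, 8.0), (0.5, 0.0)]  # half are poor readers who benefit
masked_interaction = mean_gain(swd) - mean_gain(swod)

# Second example (hypothetical values): students without disabilities
# never use the accommodation, so their scores cannot change.
swd_all_use = [(1.0, 5.0)]
swod_none_use = [(1.0, 0.0)]
apparent_interaction = mean_gain(swd_all_use) - mean_gain(swod_none_use)
```

In the first case the interaction is 0 although the accommodation worked as intended for every student who needed it; in the second the interaction is 5 points but reflects non-use rather than a differential effect of the accommodation.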

The National Research Council report (2004) puts it this way: “For the 
most part, existing research has investigated the effects of accommodations 
on test performance but is not informative about the validity of inferences 
[emphasis added] based on scores from accommodated administrations” 
(p. 101). TE/UDAs have potential to present students with opportunities 
to fully demonstrate what they know and can do, so that results of their 
performance are valid and can be used to effectively guide instruction that 
supports students’ development of deeper levels of knowledge and under- 
standing, and greater complexity of skills. We can no longer delay the sys- 
tematic examination of tools such as TE/UDAs that hold great promise to 
support our efforts to fully include students and effectively facilitate their 
learning and achievement. 

Endnotes

1. Measured Progress was the testing vendor for Kentucky. More information is 
available at http://www.measuredprogress.org/. 

2. Kurzweil (http://www.kurzweiledu.com/) offers comprehensive reading, writing,
and learning software, commonly used in K-12 schools, for any struggling
reader, including individuals with learning difficulties.

3. The authors recognize that “scaffolds” and “scaffolding” are used in the cognitive 
psychology and instructional literature to refer to a variety of concepts and 
techniques, sometimes narrowly defined and sometimes broadly. We use the term 
in discussing assessment because it has already been applied in an assessment 
context (e.g., Hall, Strangman, & Meyer, 2009), and we hope to link to the current
applications of what is being called scaffolding in testing situations. 

4. In current state assessment terminology, such changes are referred to as 
“modifications” to the test. 

5. We recognize that this is a gross oversimplification of how CAT models actually 
work. 

6. Wiliam and Thompson (2007, cited in Wiliam, 2007) typify short-cycle formative 
assessments as those used within and between lessons and medium-cycle formative 
assessments as those used within and between units. 

7. See the white paper by Bechard, Sheinker, Abell, Barton, Blackorby, Burling, 
Camacho, Cameto, Haertel, Hansen, Johnstone, Kingston, Murray, Parker, Redfield, 
Rodriquez, & Tucker (2010) for a full discussion about how what is being measured 
is changing, including a discussion of learning progressions. 

8. For detailed information, see Sireci, Li, and Scarpati (2003) (specifically p. 48 and 
pp. 60-63) and the National Research Council (2004) (specifically pp. 87-88, p. 96, 
and p. 101).